The Applications Of Unsupervised Learning To Japanese Grapheme-Phoneme Alignment
Abstract
In this paper, we adapt the TF-IDF model to the Japanese grapheme-phoneme alignment task by way of a simple statistical model and an incremental learning method. In the incremental learning method, grapheme-phoneme alignment paradigms are disambiguated one at a time according to the relative plausibility of the highest-scoring alignment schema, and the statistical model is retrained accordingly. In a limited evaluation, the learning method achieved an accuracy of 93.28%, a slight improvement over a baseline rule-based method.
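The incremental loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so a plain relative-frequency score stands in for the TF-IDF-based model, and "relative plausibility" is approximated by the margin between the best and second-best candidate. All function names and the romanized toy entries are hypothetical.

```python
from collections import Counter
from itertools import combinations


def candidate_alignments(graphemes, phonemes):
    """Yield every split of `phonemes` into len(graphemes) non-empty chunks."""
    n, m = len(graphemes), len(phonemes)
    if n == 0 or m < n:
        return
    for cuts in combinations(range(1, m), n - 1):
        bounds = (0, *cuts, m)
        yield tuple(
            (graphemes[i], phonemes[bounds[i]:bounds[i + 1]]) for i in range(n)
        )


def train(resolved, unresolved):
    """Count grapheme-phoneme pairs: weight 1 for resolved alignments,
    and a uniform fractional weight for each still-ambiguous candidate."""
    pair, graph = Counter(), Counter()
    for alignment in resolved.values():
        for g, p in alignment:
            pair[g, p] += 1.0
            graph[g] += 1.0
    for cands in unresolved.values():
        w = 1.0 / len(cands)
        for alignment in cands:
            for g, p in alignment:
                pair[g, p] += w
                graph[g] += w
    return pair, graph


def score(alignment, pair, graph):
    """Assumed stand-in score: product of relative pair frequencies."""
    s = 1.0
    for g, p in alignment:
        s *= pair[g, p] / graph[g]
    return s


def incremental_align(entries):
    """Resolve one entry per iteration, then retrain the counts."""
    resolved = {}
    unresolved = {}
    for entry in entries:
        cands = list(candidate_alignments(*entry))
        if cands:  # entries with no valid segmentation are skipped
            unresolved[entry] = cands
    while unresolved:
        pair, graph = train(resolved, unresolved)
        best_entry, best_alignment, best_margin = None, None, -1.0
        for entry, cands in unresolved.items():
            ranked = sorted(cands, key=lambda a: score(a, pair, graph), reverse=True)
            top = score(ranked[0], pair, graph)
            runner_up = score(ranked[1], pair, graph) if len(ranked) > 1 else 0.0
            margin = top / (top + runner_up)  # 1.0 means unambiguous
            if margin > best_margin:
                best_entry, best_alignment, best_margin = entry, ranked[0], margin
        resolved[best_entry] = best_alignment
        del unresolved[best_entry]
    return resolved


# Toy data (hypothetical): kanji strings paired with romanized readings.
entries = [("山川", "yamakawa"), ("山", "yama"), ("川", "kawa")]
result = incremental_align(entries)
```

In this toy run the single-character entries have only one possible alignment, so they are resolved first, and their counts then favor the split yama/kawa for the two-character entry.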
Related resources
A Comparative Study of Unsupervised Grapheme-Phoneme Alignment Methods
This paper describes and compares two unsupervised algorithms to automatically align Japanese grapheme and phoneme strings, identifying segment-level correspondences between them. The first algorithm is inspired by the tf-idf model, including enhancements to handle phonological variation and determine frequency through analysis of “alignment potential”. The second algorithm relies on the C4.5 c...
A Novel Approach to Unsupervised Grapheme-to-Phoneme Conversion
Automatic, data-driven grapheme-to-phoneme conversion is a challenging but often necessary task. The top-down strategy implicitly adopted by traditional inductive learning techniques tends to dismiss relevant contexts when they have been seen too infrequently in the training data. This paper proposes instead a bottom-up approach which, by design, exhibits better generalization properties. For e...
Efficient Grapheme-phoneme Alignment for Japanese
Current approaches to the grapheme-phoneme alignment problem for Japanese achieve good accuracy, but are extremely computationally expensive. In this paper we evaluate various modifications to previous algorithms for both the alignment and okurigana detection subtasks. The best algorithm achieved accuracy of 96.2% for the combined task on a limited data set, and was significantly more efficient...
Automated Japanese grapheme-phoneme alignment
This paper describes an adaptation of the tf-idf model to Japanese grapheme-phoneme alignment, without reliance on training data. The tf-idf model is optionally complemented with affixation and conjugation handling modules, and determines frequencies through analysis of “alignment potential”. The proposed system achieved a maximum accuracy of 94.74% on evaluation.
A latent analogy framework for grapheme-to-phoneme conversion
Data-driven grapheme-to-phoneme conversion involves either (top-down) inductive learning or (bottom-up) pronunciation by analogy. As both approaches rely on local context information, they typically require some external linguistic knowledge, e.g., individual grapheme/phoneme correspondences. To avoid such supervision, this paper proposes an alternative solution, dubbed pronunciation by latent ...
Publication date: 1999